feat(celery): Prepare to run on multiple queues #19157
Conversation
Reasoning: We need to configure Celery workers in several places to consume from a specific set of queues.
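For context, a minimal sketch of the two halves involved (plain Celery API, not code from this PR; the app name and task are placeholders): producers pick a queue when dispatching, and a worker only consumes the queues it is explicitly started with.

from celery import Celery

app = Celery("posthog")  # app name assumed for illustration

# Producers choose the queue at dispatch time, e.g.:
#   some_task.apply_async(queue="email")

# A worker only consumes the queues it is started with; tasks routed to
# other queues stay in the broker untouched.
app.worker_main(["worker", "--queues", "celery,email"])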
Size Change: +129 B (0%). Total Size: 2 MB.
@@ -0,0 +1,3 @@
# Default set of queues to be used by Celery.
# Important: Add new queues to make Celery consume tasks from them.
CELERY_WORKER_QUEUES=celery,email,insight_export,insight_refresh,gevent
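As a rough sketch of how this file would likely be consumed (the command and setting shown are assumptions, not part of this diff): the shell start scripts can forward the value straight to the worker's -Q/--queues option, and Python code can split it into a list where needed.

import os

# Shell side (illustrative):
#   celery -A posthog worker -Q "$CELERY_WORKER_QUEUES"
# Python side, if the same list is needed as a setting:
CELERY_WORKER_QUEUES = [q for q in os.getenv("CELERY_WORKER_QUEUES", "celery").split(",") if q]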
If it is somehow possible to define this in code, I would 100% vote for that, as otherwise I think this file will get forgotten about.
Best case would be to give ourselves a chance to never miss it, for example by using an enum of sorts so that we define the queues in one place. Then, if the env var isn't set, we set it to a default of all the known queues.
Mhm, not super sure what you mean. This was my attempt to define it in one place. The problem I see here is that, as far as I can tell, we start Celery workers for local and hobby deployments in two separate places. Both of these are shell scripts, hence both can use this env file. To be honest, I don't really find it nice either. Do you see another way to do it? 🤔
@benjackwhite I'm still not sure how to immediately solve this in a better way. What I did improve though is that we can now use the env file for the PyCharm run config as well, so we can use this single list of all queues in all places.
Should have been clearer with my description.
What I'm wondering is whether there is any way we can do this in Python?
So for example:
import os
from enum import Enum

class CeleryQueue(str, Enum):
    DEFAULT = "celery"
    EMAIL = "email"

all_queues = [e.value for e in CeleryQueue]
CELERY_WORKER_QUEUES = os.getenv("CELERY_WORKER_QUEUES") or ",".join(all_queues)
My thinking being that we then codify the default list of queues in code rather than relying on an external setting (that can easily be forgotten about), and it should be picked up automatically if we add a new queue.
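As a usage sketch on top of that (the task is a placeholder, not code from this PR), dispatch sites could then reference the enum instead of a bare string:

# some_task stands in for any Celery task
some_task.apply_async(queue=CeleryQueue.EMAIL.value)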
If it just isn't possible for one reason or another, though, then consider this non-blocking and we can push forward.
Alternatively - if Celery would just default to processing from all queues anyway - why do we need this config at all? 😅
How I see this problem is that Celery is run via shell scripts and we need to pass in the queues there (it apparently cannot consume from "all" – at least not that I found). I agree that having this in Python, where we also direct the tasks to the queues, would be great though.
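For reference, a small sketch of the default behaviour being described (standard Celery, nothing PostHog-specific): a worker only consumes the default queue unless it is given an explicit list, so there is no built-in "consume everything" mode to fall back on.

from celery import Celery

app = Celery("example")  # placeholder app name
print(app.conf.task_default_queue)  # -> "celery"
# Every other queue has to be listed explicitly when the worker starts, e.g.:
#   celery -A example worker -Q celery,email,insight_export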
Gotcha. Yeah if its just not possible then I guess this is the solution 😅
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the label.
# Conflicts:
#	requirements.txt

# Conflicts:
#	requirements-dev.txt
#	requirements.in
#	requirements.txt
102c5d4 to 40b985c (Compare)
Problem
We want to run separate Celery workers that each consume a different set of queues.
See RFC https://github.com/PostHog/product-internal/pull/537
Changes
How did you test this code?